Fast Vertical Mining of Sequential Patterns Using Co-occurrence Information
نویسندگان
چکیده
Sequential pattern mining algorithms using a vertical representation are the most efficient for mining sequential patterns in dense or long sequences, and have excellent overall performance. The vertical representation allows generating patterns and calculating their supports without performing costly database scans. However, a crucial performance bottleneck of vertical algorithms is that they use a generate-candidateand-test approach that can generate a large amount of infrequent candidates.To address this issue, we propose pruning candidates based on the study of item co-occurrences. We present a new structure named CMAP (Co-occurence MAP) for storing co-occurrence information. We explain how CMAP can be used to prune candidates in three state-of-the-art vertical algorithms, namely SPADE, SPAM and ClaSP. An extensive experimental studywith six real-life datasets shows that (1) co-occurrence-based pruning is effective, (2) CMAP is very compact and that (3) the resulting algorithms outperform state-of-the-art algorithms for mining sequential patterns (GSP, PrefixSpan, SPADE and SPAM) and closed sequential patterns (ClaSP and CloSpan).
منابع مشابه
VGEN: Fast Vertical Mining of Sequential Generator Patterns
Sequential pattern mining is a popular data mining task with wide applications. However, the set of all sequential patterns can be very large. To discover fewer but more representative patterns, several compact representations of sequential patterns have been studied. The set of sequential generators is one the most popular representations. It was shown to provide higher accuracy for classifica...
متن کاملVMSP: Efficient Vertical Mining of Maximal Sequential Patterns
Sequential pattern mining is a popular data mining task with wide applications. However, it may present too many sequential patterns to users, which makes it difficult for users to comprehend the results. As a solution, it was proposed to mine maximal sequential patterns, a compact representation of the set of sequential patterns, which is often several orders of magnitude smaller than the set ...
متن کاملA new stochastic 3D seismic inversion using direct sequential simulation and co-simulation in a genetic algorithm framework
Stochastic seismic inversion is a family of inversion algorithms in which the inverse solution was carried out using geostatistical simulation. In this work, a new 3D stochastic seismic inversion was developed in the MATLAB programming software. The proposed inversion algorithm is an iterative procedure that uses the principle of cross-over genetic algorithms as the global optimization techniqu...
متن کاملFrom Sequence Mining to Multidimensional Sequence Mining
Sequential pattern mining has been broadly studied and many algorithms have been proposed. The first part of this chapter proposes a new algorithm for mining frequent sequences. This algorithm processes only one scan of the database thanks to an indexed structure associated to a bit map representation. Thus, it allows a fast data access and a compact storage in main memory. Experiments have bee...
متن کاملApproaches for Pattern Discovery Using Sequential Data Mining
In this chapter we first introduce sequence data. We then discuss different approaches for mining of patterns from sequence data, studied in literature. Apriori based methods and the pattern growth methods are the earliest and the most influential methods for sequential pattern mining. There is also a vertical format based method which works on a dual representation of the sequence database. Wo...
متن کامل